INDEX
Explanations
verbs followed by related nouns
Negative Logits
ق	0.91
ه	0.86
ل	0.80
га	0.79
ج	0.78
in	0.76
ли	0.76
在	0.76
ن	0.74
can	0.73
Positive Logits
(token missing)	0.86
ンの	0.68
is	0.66
on	0.65
Chihuahua	0.64
of	0.63
Soldaten	0.62
Steelers	0.61
UnwrapRef	0.60
Venezuelan	0.60
Activations Density 0.111%