INDEX
Explanations
statements related to the qualities or properties of subjects
Negative Logits
Kar      -0.16
ril      -0.16
plans    -0.14
Kar      -0.14
Jet      -0.14
avar     -0.14
illery   -0.14
atoms    -0.14
notated  -0.14
ô        -0.13
Positive Logits
itra     0.17
elter    0.15
iesel    0.15
θή       0.15
eration  0.14
-tm      0.14
TRGL     0.14
utow     0.14
erator   0.14
strate   0.14
Activations Density: 0.079%