Explanations
constructs that initiate statements or questions
Negative Logits
kuk            -0.16
inyin          -0.15
cela           -0.15
ContentLoaded  -0.15
apesh          -0.14
mania          -0.14
endir          -0.14
stable         -0.14
stå            -0.14
ogany          -0.14
Positive Logits
far            0.20
far            0.17
ief            0.15
IVA            0.15
ething         0.15
yl             0.15
oner           0.15
iled           0.15
_far           0.15
GES            0.14
Activations Density: 0.072%