INDEX
Explanations
phrases indicating admission or confession
Negative Logits
ales      -0.15
reative   -0.15
Pied      -0.14
Fet       -0.13
Ùĩ        -0.13
vais      -0.13
ade       -0.13
eft       -0.13
trak      -0.13
CHR       -0.13
Positive Logits
duc        0.17
že         0.16
orative    0.16
admitted   0.16
ders       0.15
ducible    0.15
221        0.15
phere      0.15
admission  0.15
admit      0.15
Activations Density: 0.047%