Explanations
phrases related to legal actions and statements
commanding phrases or direct instructions
Negative Logits
Token            Logit
pmwiki           -0.55
incent           -0.49
abre             -0.47
stocking         -0.47
Azerb            -0.46
accommodating    -0.45
£ı               -0.45
millenn          -0.45
comr             -0.45
condu            -0.45
Positive Logits
Token            Logit
ONLY             0.62
doesn            0.59
it               0.57
only             0.56
don              0.54
hasn             0.54
nobody           0.53
hers             0.53
isn              0.52
only             0.52
Activations Density 0.975%