INDEX
Explanations
instances of the word "that" indicating a focus on statements or claims
Negative Logits
atis        -0.15
ylland      -0.14
orges       -0.14
ões         -0.14
ROP         -0.13
alis        -0.13
kas         -0.13
Ñıн         -0.13
ointed      -0.13
èIJ½ãģ¡      -0.13
Positive Logits
iero        0.16
omu         0.14
782         0.14
icap        0.14
eft         0.14
abcdefgh    0.13
ABCDEFGHI   0.13
Warn        0.13
ãģ¨ãģĵãĤį   0.13
imb         0.13
Activations Density 0.009%