INDEX
Explanations
occurrences of the word "this"
Negative Logits
Token     Logit
mare      -0.17
rica      -0.15
sound     -0.15
others    -0.15
Mare      -0.15
Ort       -0.14
osl       -0.14
.bs       -0.14
tring     -0.14
lah       -0.14
Positive Logits
Token     Logit
ход       0.16
ensch     0.15
/th       0.14
Tactical  0.14
alu       0.14
ODY       0.14
elho      0.14
ffee      0.14
LLU       0.14
ewise     0.14
Activations Density 0.152%
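
The page does not define the density figure. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero, might look like the following. The function name, the threshold parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The dashboard itself does not state this.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 15 of which activate the feature.
sample = [0.0] * 9985 + [1.2] * 15
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.152% would mean the feature fires on roughly 15 of every 10,000 tokens, which is typical of a sparse feature.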