INDEX
Explanations
words or phrases related to frequency or repetition
Negative Logits
æ¾           -0.16
compromises  -0.15
å¤ĩ          -0.15
ROOM         -0.15
ÌĢ           -0.14
tracking     -0.14
kèm          -0.14
еж           -0.14
IRECTION     -0.13
room         -0.13
Positive Logits
ument  0.19
fr     0.18
-fr    0.17
ombat  0.15
onds   0.15
atz    0.15
eware  0.15
/fr    0.15
kin    0.15
FR     0.15
Activations Density: 0.116%