Explanations
repeated mentions of specific characters or names in interactions
Negative Logits
æĽ          -0.16
ÑĨен        -0.16
bolt        -0.16
kud         -0.15
scrim       -0.14
ertools     -0.14
.Utc        -0.14
/Graphics   -0.14
оÑĢÑĭ       -0.14
fab         -0.14
Positive Logits
ames        0.32
agh         0.31
aj          0.31
ajs         0.30
aja         0.29
atan        0.28
akes        0.28
ajas        0.28
avi         0.27
ishi        0.27
Activations Density: 0.032%