INDEX
Explanations
occurrences of specific words or character sequences related to names or places
Negative Logits
_NC       -0.15
utron     -0.14
hoot      -0.14
leader    -0.14
Trading   -0.14
rally     -0.14
Rally     -0.14
dev       -0.13
rape      -0.13
erap      -0.13
Positive Logits
ungen     0.19
ouser     0.18
aker      0.17
PTY       0.17
abeth     0.16
sak       0.16
asser     0.15
bury      0.15
bian      0.15
unger     0.14
Activations Density: 0.053%
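
The token lists and the density figure above can be handled programmatically. The following is a minimal sketch, not the tool's own code: the dictionary names and the helpers `top_tokens` and `expected_active_tokens` are hypothetical, and it assumes that "Activations Density" means the fraction of tokens on which the feature is active, which this page does not state.

```python
# Token/logit pairs copied from the lists above.
negative_logits = {
    "_NC": -0.15, "utron": -0.14, "hoot": -0.14, "leader": -0.14,
    "Trading": -0.14, "rally": -0.14, "Rally": -0.14,
    "dev": -0.13, "rape": -0.13, "erap": -0.13,
}
positive_logits = {
    "ungen": 0.19, "ouser": 0.18, "aker": 0.17, "PTY": 0.17,
    "abeth": 0.16, "sak": 0.16, "asser": 0.15, "bury": 0.15,
    "bian": 0.15, "unger": 0.14,
}

# Density as shown on the page, in percent.
ACTIVATION_DENSITY_PERCENT = 0.053


def top_tokens(logits, k=3, largest=True):
    """Return the k (token, value) pairs with the largest or smallest values.

    Ties keep the order in which the tokens were listed (sorted is stable).
    """
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=largest)
    return ranked[:k]


def expected_active_tokens(density_percent, n_tokens):
    """Estimate how many of n_tokens activate the feature.

    Assumption: density is the fraction of tokens with a nonzero activation.
    """
    return density_percent / 100.0 * n_tokens


if __name__ == "__main__":
    print(top_tokens(positive_logits, k=3))
    print(top_tokens(negative_logits, k=3, largest=False))
    print(round(expected_active_tokens(ACTIVATION_DENSITY_PERCENT, 100_000)))
```

Under that reading of density, the feature would fire on roughly 53 of every 100,000 tokens, which fits a feature tied to a narrow set of name or place fragments.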