INDEX
Explanations
names of individuals or organizations with slight variations
sequences involving proper nouns or names
Negative Logits
Token     Logit
ulhu      -0.75
Misc      -0.71
lasses    -0.70
SOS       -0.66
nomine    -0.66
fuck      -0.63
FIGHT     -0.63
LW        -0.63
[+]       -0.61
replay    -0.61
Positive Logits
Token     Logit
Hart      0.95
Birth     0.92
Marie     0.91
Pont      0.91
Hol       0.89
Mer       0.88
Pal       0.86
Cent      0.86
Web       0.86
Charl     0.85
Activations Density: 0.079%
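As a rough illustration of how a density figure like this could be read, the sketch below assumes that activation density means the percentage of tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample values are all assumptions made for illustration; none of them come from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations over 10 tokens; the feature fires on exactly one.
sample = [0.0, 0.0, 0.0, 2.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.079% would correspond to the feature firing on roughly 8 out of every 10,000 tokens.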