Explanations
proper nouns related to specific names and locations
mentions of specific individuals or names related to the content
Negative Logits
Flavoring   -0.76
arching     -0.74
eways       -0.70
ickr        -0.69
ocobo       -0.69
via         -0.68
notice      -0.68
pg          -0.68
Hispanic    -0.68
wcs         -0.67
Positive Logits
Benson       1.17
hurst        0.92
ModLoader    0.88
sterdam      0.86
Bag          0.78
Briggs       0.75
Henderson    0.70
ua           0.70
Lamb         0.68
offic        0.67
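The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. As a minimal sketch, assuming the values are the feature's direct logit effect (its decoder direction projected through the model's unembedding matrix), the ranking could be reproduced as follows. The function name `top_logit_effects` and the arguments `w_dec_row`, `w_unembed` and `vocab` are hypothetical illustrations, not the dashboard's actual code.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    Assumption: the feature's effect on each token's logit is its decoder
    direction (shape (d_model,)) multiplied by the unembedding matrix
    (shape (d_model, vocab_size)).
    """
    logits = w_dec_row @ w_unembed          # (vocab_size,) direct logit effect per token
    order = np.argsort(logits)              # token indices, ascending by effect
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage: 2-dimensional residual stream, 4-token vocabulary.
neg, pos = top_logit_effects(
    np.array([1.0, 0.0]),
    np.array([[0.5, -0.7, 1.2, 0.0],
              [9.0, 9.0, 9.0, 9.0]]),
    ["a", "b", "c", "d"],
    k=2,
)
print(neg, pos)
```

Because only the ordering matters for the lists, a full `argsort` is simple and adequate here; for a large vocabulary, `np.argpartition` would avoid sorting every token.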
Activation Density: 0.009%
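Activation density is commonly reported as the share of sampled tokens on which a feature fires. A density of 0.009% therefore corresponds to roughly 9 activating tokens per 100,000. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. The helper name `activation_density` and the `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: density = (# tokens with activation > threshold) / (# tokens).
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy usage: 9 active tokens out of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:9] = 1.0
print(f"{activation_density(acts):.3f}%")
```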