Explanations
mentions of a specific person's last name
references to specific individuals, particularly politicians
Negative Logits
Token         Logit
Kinnikuman    -0.74
Alexa         -0.67
GIF           -0.66
Furious       -0.65
Narr          -0.65
Filip         -0.64
é¾įåĸļ士      -0.63
Blow          -0.63
Warp          -0.62
making        -0.61
Positive Logits
Token         Logit
erc           1.23
therap        1.01
icum          0.98
abulary       0.97
ourse         0.97
eatures       0.97
issions       0.94
satell        0.91
ourses        0.90
ules          0.90
Activations Density: 0.004%