Explanations
phrases related to expressing opinions or actions
references to individuals and actions related to governance or public figures
Negative Logits
regulator    -0.67
vere         -0.62
ach          -0.60
icate        -0.59
Kills        -0.57
urable       -0.57
Malays       -0.57
vampire      -0.56
Rory         -0.55
[+           -0.54
Positive Logits
Enlarge         0.87
immediately     0.70
invariably      0.67
greeted         0.67
astonished      0.65
unsurprisingly  0.65
leased          0.65
noticed         0.64
ibo             0.64
oss             0.63
Activations Density: 0.375%