INDEX
Explanations
phrases related to political or military entities
references to specific television shows or series
Negative Logits
hammad        -0.81
etically      -0.74
ventus        -0.73
font          -0.73
esis          -0.73
#$            -0.72
colour        -0.71
cious         -0.70
escription    -0.70
veyard        -0.69
Positive Logits
anches        0.91
Feldman       0.71
CTV           0.70
iable         0.68
enegger       0.68
aval          0.67
ieri          0.63
Sunny         0.62
converge      0.61
REPORT        0.61
Activations Density: 0.029%