Explanations
news-related and crime-related phrases
Negative Logits
tein         -0.80
Stall        -0.73
orem         -0.69
Schne        -0.67
Dictionary   -0.66
GC           -0.66
aceae        -0.64
Scale        -0.64
Schedule     -0.63
simulator    -0.62
Positive Logits
politics     0.80
news         0.79
middle       0.74
breaking     0.71
dp           0.68
NEWS         0.68
ontent       0.67
"]=>         0.67
truth        0.65
inion        0.65
Activations Density 0.138%