INDEX
Explanations
geographical locations and specific details mentioned in news articles
geographical locations and references to entities, such as businesses or educational institutions
Negative Logits
unlike        -0.54
Siber         -0.52
disabling     -0.50
DERR          -0.49
Berk          -0.48
Pengu         -0.47
ãĤŃ           -0.47
pi            -0.46
differently   -0.46
Kafka         -0.44
Positive Logits
appeared      0.93
ensured       0.90
succumbed     0.86
surfaced      0.84
emerged       0.84
collided      0.83
prevailed     0.83
appears       0.83
began         0.82
resulted      0.81
Activations Density 0.880%