INDEX
Explanations
mentions of historical events or political contexts
Negative Logits
forbids       -0.73
buster        -0.71
reproduce     -0.70
arten         -0.70
differs       -0.70
thereof       -0.69
understands   -0.69
ooth          -0.69
identifies    -0.69
differed      -0.69
Positive Logits
fact          1.27
plight        1.13
plethora      1.02
myriad        1.00
horrors       0.98
sheer         0.98
countless     0.97
multitude     0.97
dangers       0.93
infamous      0.92
Activations Density 0.634%