INDEX
Explanations
phrases related to potential danger or threats
repeated mentions of the word "rangers" in various contexts
Negative Logits
Token            Logit
periodic         -0.70
ately            -0.68
iron             -0.64
balanced         -0.64
Hilbert          -0.63
bal              -0.63
floor            -0.62
¢                -0.61
earliest         -0.60
instantaneous    -0.59
Positive Logits
Token            Logit
angers           1.34
anger            0.96
auga             0.88
unlaw            0.86
folk             0.84
behav            0.77
extraord         0.76
ocial            0.73
keeper           0.72
chwitz           0.69
Activations Density: 0.004%