Explanations
phrases related to imminent threats or dangers
references to threats and impending dangers
Negative Logits
Token       Logit
ãĤ¼         -0.65
ggies       -0.65
xx          -0.63
arus        -0.63
essen       -0.61
inations    -0.60
ilar        -0.59
earch       -0.58
utsche      -0.57
OH          -0.57
Positive Logits
Token       Logit
omin        1.08
overhead    1.06
hovering    0.98
looming     0.93
menacing    0.87
above       0.82
haunting    0.82
rily        0.81
challeng    0.78
atop        0.78
Activations Density: 0.082%
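The page does not say how the density figure is computed. One common reading, assumed here, is the fraction of sampled tokens on which the feature has a nonzero activation; 0.082% would then correspond to roughly 82 active tokens per 100,000. The sketch below illustrates that reading only; `activation_density` and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Toy data: 100,000 tokens, of which 82 have a positive activation.
    toy = [0.0] * (100_000 - 82) + [1.5] * 82
    density = activation_density(toy)
    print(f"{density:.3%}")
```

A feature this sparse is silent on the vast majority of tokens, which fits the narrow set of threat-related words in the positive-logit list.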