INDEX
Explanations
mentions of the word "aid."
references to aid or assistance
Negative Logits
################  -0.63
grill             -0.60
Dresden           -0.59
bang              -0.57
Budapest          -0.56
Harlem            -0.54
Attach            -0.53
Monroe            -0.52
Rim               -0.51
NIGHT             -0.51
Positive Logits
doms   0.98
itsch  0.95
sie    0.90
s      0.87
si     0.82
oman   0.81
t      0.81
robe   0.80
taker  0.80
IUM    0.80
Activations Density 0.046%