Explanations
Instances of the word "hide" or of phrases related to hiding or concealment.
Negative Logits
ilingual    -0.75
iod         -0.71
bably       -0.67
etheless    -0.64
ipolar      -0.62
Carnage     -0.62
odic        -0.61
relegated   -0.60
theless     -0.59
isance      -0.58
Positive Logits
away        1.11
ously       1.08
ous         0.94
Caption     0.83
hiro        0.78
avi         0.75
aways       0.75
cases       0.73
hide        0.72
ance        0.72
Activation Density 0.005%
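The dashboard does not say how these numbers are derived. The sketch below shows one common approach, offered as an assumption rather than as this page's actual method.

- The logit lists come from projecting the feature's decoder vector through the model's unembedding matrix, then taking the k highest and k lowest vocabulary entries.
- Activation density is the fraction of tokens on which the feature fires.

The function names `logit_lists` and `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
import numpy as np


def logit_lists(w_dec, w_u, vocab, k=10):
    """Hypothetical helper: top-k positive and negative tokens for one feature.

    Assumes w_dec is the feature's decoder vector (shape: d_model) and
    w_u is the unembedding matrix (shape: d_model x vocab_size).
    """
    logits = w_dec @ w_u  # direct effect of the feature on each vocab logit
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of tokens where activation > threshold."""
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))


# Toy example (made-up data): an identity unembedding makes the logits
# equal to the decoder vector itself.
vocab = ["away", "hide", "bably", "odic"]
w_u = np.eye(4)
w_dec = np.array([1.11, 0.72, -0.67, -0.61])
pos, neg = logit_lists(w_dec, w_u, vocab, k=2)
print(pos)  # [('away', 1.11), ('hide', 0.72)]
print(neg)  # [('bably', -0.67), ('odic', -0.61)]
print(activation_density([0.0, 0.0, 0.3, 0.0]))  # 0.25
```

On this reading, a density of 0.005% would mean the feature is active on roughly 1 in 20,000 tokens of whatever dataset the dashboard sampled.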