INDEX
Explanations
references to things being revealed or exposed
instances of the word "revealing" and related contexts
Negative Logits
applied       -0.64
otor          -0.64
ope           -0.62
nea           -0.61
capital       -0.61
herd          -0.60
management    -0.60
onder         -0.59
etheless      -0.59
zone          -0.58
Positive Logits
revealing      0.98
iary           0.86
revelations    0.82
rays           0.79
disclosures    0.77
ly             0.76
ively          0.76
reveals        0.75
ivities        0.75
reve           0.75
Activations Density 0.008%