INDEX
Explanations
phrases that indicate the existence or accessibility of information or resources
Negative Logits
Logit   Token
-0.73   curfew
-0.73   venge
-0.65   lifelong
-0.62   ditch
-0.60   celebrates
-0.60   ocent
-0.59   salute
-0.58   conscious
-0.58   standby
-0.58   chant
Positive Logits
Logit   Token
1.30    HERE
1.21    below
1.10    Below
1.09    BELOW
1.08    Appendix
1.07    Below
1.05
1.03
1.03    here
0.99
Activations Density 0.165%