INDEX
Explanations
The presence of the word "mind" in various forms and contexts.
Negative Logits
Mehran        -0.79
HH            -0.71
sever         -0.66
KING          -0.64
excise        -0.64
ATED          -0.61
Filename      -0.61
Reviewed      -0.60
interstitial  -0.60
TAIN          -0.60

Positive Logits
fulness       1.42
ful           1.23
bender        1.09
storms        1.00
iac           0.99
fully         0.95
scape         0.95
spring        0.87
sets          0.86
sworth        0.86
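
The positive tokens read as completions of "mind" (mindfulness, mindful, mindset, mindscape), which is consistent with the explanation. This page does not say how the two lists are produced. A common approach, assumed here and not confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and keep the k highest and k lowest entries. The sketch below uses a made-up 2-dimensional model and a hypothetical 4-token vocabulary purely for illustration. The function names `logit_effects` and `top_and_bottom` are also hypothetical.

```python
import numpy as np

def logit_effects(decoder_dir, W_U):
    """Assumed definition: direct effect of a feature on each vocab logit.

    decoder_dir: (d_model,) decoder direction of the feature (hypothetical)
    W_U:         (d_model, vocab) unembedding matrix (hypothetical)
    """
    return decoder_dir @ W_U

def top_and_bottom(effects, vocab, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    order = np.argsort(effects)  # ascending order of effect
    bottom = [(vocab[i], float(effects[i])) for i in order[:k]]
    top = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return top, bottom

# Toy numbers, not taken from any real model.
vocab = ["fulness", "sever", "spring", "HH"]
decoder_dir = np.array([1.0, -0.5])
W_U = np.array([[1.4, -0.3, 0.9, -0.2],
                [0.0,  0.8, 0.1,  1.2]])

effects = logit_effects(decoder_dir, W_U)
top, bottom = top_and_bottom(effects, vocab, k=2)
print("positive:", top)
print("negative:", bottom)
```

With k set to 10 and a real vocabulary, the same ranking would yield two ten-entry lists in the shape shown on this page.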

Activation Density: 0.006%
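
Reading the density as the fraction of tokens on which the feature fires (an assumed definition, since the page does not state it), 0.006% means roughly 6 activating tokens per 100,000. The sketch below shows that arithmetic on a synthetic activation array. The function name `activation_density` and the zero firing threshold are assumptions.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    return float(np.mean(acts > threshold))

# Synthetic example: 6 firing tokens out of 100,000.
acts = np.zeros(100_000)
acts[:6] = 1.0

density = activation_density(acts)
print(f"{density * 100:.3f}%")
```

A density this low marks a sparse feature: it is silent on almost every token and fires only in narrow contexts, such as the "mind" compounds above.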