INDEX
Explanation: references to inner experiences or emotions
Negative Logits
Token          Logit
HasKey         -0.76
Harlow         -0.71
PPC            -0.71
Mawr           -0.71
Gantt          -0.67
هاند           -0.65
posób          -0.64
PhysRevLett    -0.63
tomu           -0.63
Geographie     -0.63
Positive Logits
Token          Logit
Inside          2.30
inside          2.25
Inside          2.24
inside          2.15
INSIDE          2.15
INSIDE          1.96
insides         1.42
Dentro          1.33
Outside         1.31
OUTSIDE         1.28
Activation density: 0.051%
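A minimal sketch of how the two logit lists and the density figure could be produced, assuming that the lists are the k most positive and k most negative entries of a per-token logit-effect vector for this feature, and that activation density is the percentage of tokens on which the feature's activation is above zero. The function names, the zero threshold, and the toy data are hypothetical. They are not taken from the dashboard's own code.

```python
import numpy as np

def top_bottom_logits(logit_effects, vocab, k=10):
    """Return the k most positive and k most negative (token, value) pairs.

    Assumes logit_effects is a 1-D array with one entry per vocabulary token.
    """
    order = np.argsort(logit_effects)  # ascending: most negative first
    bottom = [(vocab[i], float(logit_effects[i])) for i in order[:k]]
    top = [(vocab[i], float(logit_effects[i])) for i in order[::-1][:k]]
    return top, bottom

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds the (assumed) threshold."""
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

if __name__ == "__main__":
    # Toy data only; not the values shown above.
    top, bottom = top_bottom_logits(np.array([0.5, -1.0, 2.0, 0.0]), ["a", "b", "c", "d"], k=2)
    print(top, bottom)
    acts = np.zeros(2000)
    acts[0] = 1.0  # one active token out of 2,000
    print(activation_density(acts))
```

Under this reading, a density of 0.051% means the feature fires on roughly 51 of every 100,000 tokens.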