Explanations
phrases related to mental processes and thoughts
words related to thoughts and mental processes
Negative Logits
Firm       -0.67
Moder      -0.66
Pist       -0.63
afort      -0.62
byn        -0.62
degrees    -0.59
ammy       -0.59
Pix        -0.56
Norm       -0.56
hi         -0.55
Positive Logits
steps      0.85
hole       0.79
swing      0.67
selves     0.67
INESS      0.65
cavity     0.65
eks        0.63
doorstep   0.63
PATH       0.62
balcony    0.60
Activations Density 0.140%