INDEX
Explanations
references to intelligence and cognitive abilities
Negative Logits
myſelf        -0.80
pleaſure      -0.77
themſelves    -0.76
himſelf       -0.76
itſelf        -0.74
Reſ           -0.73
ſeveral       -0.73
ftagPool      -0.70
leſs          -0.70
ſmall         -0.67
Positive Logits
intelligence  0.89
INTEL         0.84
intellectual  0.81
intelligent   0.81
Intel         0.72
Intel         0.71
Gut           0.69
intellect     0.68
Track         0.68
gut           0.68
Activations Density: 0.084%