INDEX
Explanations
references to intelligence and competence
Negative Logits
unknownFields   -0.47
their           -0.36
its             -0.35
<bos>           -0.34
album           -0.33
few             -0.33
getPost         -0.33
Section         -0.33
estre           -0.32
Pullman         -0.32
Positive Logits
Smart           0.88
Intelligent     0.84
Smart           0.84
Intelligent     0.83
Intelligence    0.82
smartest        0.81
smart           0.80
Wise            0.79
SMART           0.79
intelligent     0.79
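
Token lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the per-token scores. This page does not say how its values were computed, so the following is only a minimal sketch under that assumption. The function name `top_logit_tokens`, the two-dimensional toy model, and the tiny vocabulary are hypothetical; the toy scores simply reuse a few values shown above.

```python
def top_logit_tokens(decoder_dir, unembed, vocab, k=3):
    """Return (negative, positive): the k lowest and k highest (token, score) pairs.

    decoder_dir: list of d_model floats (assumed feature decoder direction)
    unembed:     d_model rows, each with len(vocab) floats (assumed unembedding)
    """
    # score[j] = sum_i decoder_dir[i] * unembed[i][j]
    scores = [
        sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Hypothetical toy setup: d_model = 2, vocabulary of 4 tokens.
vocab = ["Smart", "Wise", "their", "its"]
unembed = [
    [0.88, 0.79, -0.47, -0.36],  # direction the toy feature writes to
    [0.10, 0.20, 0.30, 0.40],    # unrelated direction, ignored by this feature
]
decoder_dir = [1.0, 0.0]

negative, positive = top_logit_tokens(decoder_dir, unembed, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

In a real model the unembedding has tens of thousands of columns, so the same projection would normally be done with a vectorized matrix product rather than Python loops.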
Activations Density 0.229%
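
Activation density is usually read as the fraction of sampled tokens on which the feature fires, so 0.229% would correspond to roughly 229 active tokens per 100,000. The sketch below assumes that "fires" means a strictly positive activation. The function name `activation_density` and the synthetic sample are hypothetical and are not taken from this page's underlying data.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic sample: 229 active tokens out of 100,000 (illustrative only).
sample = [0.0] * 99_771 + [1.5] * 229
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the feature is sparse: it is silent on the vast majority of tokens and active only on a narrow slice of text.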