Explanations
mentions of figurines or physical objects in a descriptive context
terms related to figures and their attributes in a broader context
Negative Logits
Token       Logit
figure      -0.82
mitt        -0.79
mit         -0.71
vain        -0.70
Fed         -0.66
fix         -0.63
hab         -0.63
clair       -0.61
figured     -0.61
vation      -0.61
Positive Logits
Token       Logit
inal        1.12
ians        1.06
ian         1.06
inations    1.04
inho        0.98
ine         0.95
anova       0.95
ines        0.93
inals       0.92
ative       0.91
Activations Density: 0.086%