INDEX
Explanations
phrases indicating visual perception or interaction with people
personal pronouns and references to individual relationships
Negative Logits
Token       Logit
ente        -0.71
ģĸ          -0.69
ĵ           -0.68
isma        -0.67
ĨĴ          -0.63
icion       -0.62
©¶æ¥µ       -0.62
idium       -0.61
İ           -0.59
creation    -0.58
Positive Logits
Token       Logit
unfold      0.85
alive       0.80
perform     0.78
interact    0.77
smiling     0.76
closely     0.74
naked       0.72
slumped     0.71
silhou      0.71
firsthand   0.69
Activations Density: 0.190%
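The density figure can be read as the share of sampled tokens on which the feature fires. The sketch below illustrates that reading only; it assumes density means "fraction of token positions with activation above a threshold", which the dashboard itself does not state. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical, chosen so that the result matches the figure shown above.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: this is what "Activations Density" measures; the source
    does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample (not real data): the feature fires on 19 of 10,000 tokens.
sample = [0.0] * 9981 + [1.5] * 19
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.190% corresponds to a feature that is active on roughly 2 tokens in every 1,000.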