Explanations
phrases indicating familiarity or experience with various situations or topics
expressions of personal characteristics or tendencies
Negative Logits
eday      -0.80
Anthem    -0.76
idas      -0.76
alion     -0.71
etheus    -0.69
yan       -0.68
ciation   -0.68
Pact      -0.65
Kore      -0.65
ankind    -0.64
Positive Logits
tink              0.89
mischief          0.84
unpredict         0.78
bullies           0.77
surprises         0.77
fools             0.77
understatement    0.77
patience          0.76
unpredictable     0.76
humor             0.75
Activations Density: 0.466%