INDEX
Explanations
numerical scores or ratings
instances of the word "score" and its variations, indicating a focus on scoring or evaluation contexts
Negative Logits
xon          -0.70
UAL          -0.70
agan         -0.68
conn         -0.67
compulsion   -0.64
necessity    -0.61
ulkan        -0.58
LECT         -0.58
flirt        -0.57
lly          -0.57
Positive Logits
card         1.02
scores       0.96
cards        0.92
ific         0.87
score        0.85
keeper       0.85
ificant      0.81
heet         0.80
ient         0.79
Scores       0.77
Activations Density 0.017%
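
As a rough illustration of how a density figure like the one above can be read, here is a minimal sketch. It assumes density means the percentage of token positions where the feature's activation is above zero; the page does not state this definition. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" is the share of token positions where the
    feature fires. The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 10,000 token positions, 2 of them active.
toy = [0.0] * 9998 + [1.3, 0.4]
print(f"{activation_density(toy):.3f}%")  # prints "0.020%"
```

Under this reading, 0.017% would correspond to the feature firing on roughly 17 out of every 100,000 token positions.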