INDEX
Explanations
phrases related to extremes, notably strong emotional reactions or contrasts
negative emotional states and expressions
Negative Logits
Token          Logit
yne            -0.70
Persons        -0.62
Families       -0.60
Afric          -0.59
Solitaire      -0.59
Means          -0.55
Syndrome       -0.53
Instit         -0.53
Transcript     -0.53
Mississippi    -0.52
Positive Logits
Token          Logit
upgr           0.73
uay            0.67
efully         0.67
borgh          0.66
iously         0.63
urous          0.63
U+FE0F         0.58
ballpark       0.57
fitting        0.56
fide           0.56
Activations Density 0.384%