INDEX
Explanations
names or mentions of a specific publication
words related to inquiries and demands, particularly in a news or editorial context
Negative Logits
ijah            -0.80
brute           -0.77
challeng        -0.75
conflic         -0.73
stalking        -0.72
ajo             -0.71
showc           -0.71
surv            -0.71
lifes           -0.71
deliberations   -0.70
Positive Logits
ments           1.10
ment            0.96
rer             0.95
teen            0.93
rers            0.91
mented          0.89
TION            0.87
MENT            0.86
MENTS           0.84
ty              0.80
Activations Density: 0.035%