INDEX
Explanations
mentions of specific geopolitical events, particularly related to military actions
references to specific nationalities or ethnic groups
Negative Logits
ashtra      -0.78
rina        -0.71
itionally   -0.64
orient      -0.62
von         -0.60
icularly    -0.59
apter       -0.58
Eva         -0.58
ordered     -0.57
stained     -0.57
Positive Logits
©¶æ¥µ       0.84
Horizons    0.79
Dialogue    0.73
ij士         0.72
andowski    0.71
dayName     0.67
cffffcc     0.66
iken        0.66
Aram        0.65
Azerb       0.65
Activations Density 0.000%