INDEX
Explanations
phrases related to conflict and confrontation
repeated special characters or symbols, particularly the "Ŀ"
Negative Logits
obser       -0.75
incorpor    -0.71
disadvant   -0.69
ende        -0.68
incent      -0.67
Palestin    -0.66
contrace    -0.65
mathemat    -0.64
unwanted    -0.63
sacrific    -0.62
Positive Logits
ï¸ı         0.95
¯           0.95
ï¸          0.81
ÃĽ          0.77
âĢł         0.76
ttp         0.74
°           0.74
âĻ          0.73
cue         0.72
tra         0.72
Activations Density 0.184%