INDEX
Explanations
phrases related to division or categorization
phrases that indicate division or categorization
Negative Logits
tor         -0.70
zai         -0.69
tun         -0.66
JM          -0.64
insulted    -0.64
entimes     -0.64
die         -0.63
onwards     -0.63
heit        -0.61
challeng    -0.59
Positive Logits
thirds      0.87
qqa         0.77
ãĤ©         0.76
categories  0.74
submission  0.73
clusions    0.73
perse       0.71
itialized   0.69
Sequ        0.69
İĭ          0.68
Activations Density 0.044%