Explanations
phrases starting with "it's not that" followed by a descriptor
phrases indicating a contrast or rejection of a notion or idea
Negative Logits
hips  -0.66
Haz  -0.63
ãĥij  -0.63
ãĥ¡  -0.63
Ö¼  -0.61
×Ļ×  -0.61
×Ļ  -0.61
Hew  -0.60
mi  -0.60
×ŀ  -0.56
Positive Logits
interstitial  0.81
cher  0.75
pesky  0.74
same  0.73
ched  0.71
angular  0.70
glomer  0.68
andestine  0.67
emic  0.67
fateful  0.66
Activations Density: 0.187%