INDEX
Explanations
phrases that emphasize the significance of a point or idea
Negative Logits
achus      -0.75
odox       -0.75
ellen      -0.75
imbabwe    -0.71
oute       -0.70
mathemat   -0.70
ussen      -0.70
aeus       -0.69
ultz       -0.69
yss        -0.67
Positive Logits
pecul        0.76
distinction  0.73
positives    0.72
facts        0.70
facts        0.70
ties         0.69
caveats      0.69
difference   0.68
ness         0.67
trophies     0.65
Activations Density: 0.088%