INDEX
Explanations
phrases related to historical and technical information
Negative Logits
Token             Logit
Factor            -0.66
DERR              -0.64
antha             -0.62
multiplication    -0.59
Numbers           -0.59
GPA               -0.57
ategory           -0.57
ultz              -0.56
assian            -0.56
artney            -0.55
Positive Logits
Token             Logit
bold              0.84
ges               0.83
Machina           0.74
tel               0.73
zel               0.73
berman            0.73
e                 0.73
llo               0.72
ÃŁ                0.72
egu               0.71
Activations Density 0.341%