Explanations
quotations in the text
Negative Logits
Ross         -0.78
rental       -0.75
buoy         -0.74
matter       -0.74
editor       -0.73
accomp       -0.73
affiliate    -0.72
rall         -0.71
grasp        -0.71
Nieto        -0.71
Positive Logits
classic       1.49
true          1.48
normal        1.47
false         1.47
little        1.46
pure          1.45
every         1.44
double        1.42
traditional   1.41
almost        1.41
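The two lists rank output tokens by how strongly this feature pushes their logits down (negative) or up (positive). The page does not say how the scores were computed. A common approach, assumed here and not confirmed by the source, is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the k lowest and k highest scores. The function names and the toy 2-dimensional vectors below are hypothetical. A real dashboard may also fold in layer norm or scaling that this sketch omits.

```python
from typing import Dict, List, Tuple

def feature_logits(decoder_vec: List[float],
                   unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical: score each token by dotting the feature's decoder
    direction with that token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_vec, col))
            for tok, col in unembed.items()}

def top_and_bottom(scores: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive

# Toy example with made-up 2-d vectors (not real model weights).
decoder = [1.0, 0.5]
unembed = {
    "classic": [1.0, 1.0],
    "true": [1.0, 0.0],
    "Ross": [-1.0, 0.0],
    "rental": [0.0, -1.0],
}
neg, pos = top_and_bottom(feature_logits(decoder, unembed), k=2)
print(neg)  # [('Ross', -1.0), ('rental', -0.5)]
print(pos)  # [('classic', 1.5), ('true', 1.0)]
```

With real model weights the same ranking would usually be computed with a matrix product over the full vocabulary rather than a Python loop.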
Activation Density: 1.670%
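Activation density can be read as the share of sampled tokens on which the feature fires. A value of 1.670% would then mean the feature is active on roughly 1 token in 60. The sketch below assumes that "fires" means an activation strictly above a threshold of zero. The dashboard does not state its exact threshold or sampling, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical: percentage of tokens whose feature activation exceeds threshold."""
    acts = list(activations)
    if not acts:
        return 0.0  # no tokens sampled, so report zero density
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)

# Toy example: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.0, 2.3, 0.0]))  # 25.0
```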