INDEX
Explanations
phrases or words indicating the "best" or "top" choice or option
phrases that indicate rankings or comparisons
Negative Logits
token       logit
factor      -0.68
krit        -0.64
shell       -0.64
hyde        -0.62
ADD         -0.61
ãĥ£         -0.59
taking      -0.59
kamp        -0.59
nsic        -0.59
¿           -0.59
Positive Logits
token       logit
luck        1.05
Worst       0.89
nesota      0.78
owitz       0.76
breed       0.74
intentions  0.73
Nanto       0.70
Luck        0.69
luck        0.69
Practices   0.68
Activations Density 0.074%
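
An activation density such as the one above can be read as the share of sampled token positions on which the feature fires. The following is a minimal sketch, assuming that "fires" means an activation strictly greater than zero; the function name, the threshold parameter, and the sample values are hypothetical and are not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: density = (positions where the feature fires) / (all positions).
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in values if a > threshold)
    return 100.0 * fired / len(values)


# Hypothetical sample: 10,000 token positions, 7 of them non-zero.
sample = [0.0] * 9993 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.1, 5.0]
density = activation_density(sample)
print(f"{density:.3f}%")
```

A density this low would mean the feature is sparse: it stays silent on the vast majority of tokens and fires only in a narrow set of contexts.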