Explanations
phrases that indicate an increase or addition
Negative Logits
æ©              -0.82
borg            -0.72
ļé              -0.69
Classification  -0.67
è£              -0.66
\\\\\\\\        -0.66
boxing          -0.65
xtap            -0.65
Ń·              -0.65
ãĤ´ãĥ³          -0.64

Positive Logits
than            0.84
mature          0.76
realistic       0.75
importantly     0.74
fortunate       0.74
prevalent       0.74
HUD             0.72
educated        0.71
frequent        0.71
interesting     0.71
Activations Density 0.011%