INDEX
Explanations
numeric values enclosed in square brackets
the presence of numerical sequences or identifiers
Negative Logits
token        logit
tremend      -0.92
olicy        -0.89
atre         -0.85
uppet        -0.83
isode        -0.81
enhagen      -0.81
orter        -0.78
ossession    -0.78
rint         -0.77
rison        -0.77
Positive Logits
token        logit
384          1.30
6666         1.19
th           0.94
650          0.86
teen         0.83
07           0.82
66666666     0.82
05           0.82
340          0.81
06           0.81
Activations Density: 0.035%