INDEX
Explanations
phrases indicating failure or disappointing outcomes
negations or expressions of something not being done or absent
Negative Logits
éĥ          -0.68
æĥ          -0.67
rift        -0.65
Pierre      -0.65
itor        -0.63
CRIP        -0.63
++++        -0.62
Rap         -0.61
weap        -0.60
Spectrum    -0.60
Positive Logits
epad          1.11
icably        1.08
yet           1.08
icable        1.07
been          1.01
yet           0.96
necessarily   0.96
hin           0.90
gotten        0.85
fared         0.83
Activations Density: 0.055%