Explanations
the word "mean" used in various contexts
phrases that indicate qualifications or clarifications about previous statements
Negative Logits
ngth        -0.82
odan        -0.75
hiba        -0.73
sidx        -0.73
albeit      -0.68
figured     -0.67
alez        -0.65
otype       -0.65
utenberg    -0.63
antha       -0.63
Positive Logits
anything     1.00
anymore      0.98
anyone       0.90
nor          0.89
anybody      0.80
everyone     0.79
any          0.77
everything   0.77
necessarily  0.76
abandoning   0.74
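The two lists read as the tokens whose logits this feature most decreases and most increases. The values appear to be scaled so that the strongest effect is 1.00 (the top positive entry, anything, is exactly 1.00). That scaling is inferred from the numbers, not stated on the page. The following is a minimal sketch of how such lists could be produced. It assumes the raw per-token effects are already available as a dict; the function name `top_logit_effects` and its input format are hypothetical.

```python
def top_logit_effects(effects, k=10):
    """Return the k most positive and k most negative token effects,
    scaled so the largest absolute effect equals 1.0.

    Hypothetical input: `effects` maps token strings to raw logit effects
    (assumed to be precomputed, e.g. by the visualization tool).
    """
    max_abs = max(abs(v) for v in effects.values())
    if max_abs == 0:
        raise ValueError("all effects are zero; nothing to rank")

    # Assumed normalization: divide by the largest magnitude, so the
    # strongest effect (positive or negative) becomes +/-1.0.
    scaled = {tok: v / max_abs for tok, v in effects.items()}

    # Sort ascending by scaled effect: most negative first.
    ranked = sorted(scaled.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    # Take the top k and list them from most positive downward.
    positive = sorted(ranked[-k:], key=lambda kv: kv[1], reverse=True)
    return positive, negative
```

If the vocabulary has fewer than 2k tokens, the two lists overlap. For a real vocabulary of tens of thousands of tokens, this does not happen.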
Activation Density: 0.080%
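An activation density of 0.080% means the feature fires on roughly 8 of every 10,000 tokens. The page does not say what counts as "active." The sketch below assumes a token is active when its activation is strictly greater than zero. Both helper names, `activation_density` and `format_density`, are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "active" means activation > threshold (default 0.0);
    the viewer's actual threshold is not stated.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def format_density(density):
    """Render a fraction as a percentage with three decimals, e.g. 0.080%."""
    return f"{density * 100:.3f}%"
```

Under this definition, 8 active positions out of 10,000 give a density of 0.0008, which is displayed as 0.080%.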