Explanations
contractions of "is" or "has" followed by a descriptive term
phrases that indicate existence or presence
Negative Logits
roit      -0.81
eal       -0.79
ares      -0.71
soever    -0.68
enth      -0.67
approves  -0.64
WARN      -0.62
eals      -0.62
|--       -0.62
ear       -0.59
Positive Logits
been      1.12
gotta     1.08
plenty    0.99
always    0.93
nothing   0.92
gonna     0.89
no        0.85
still     0.84
ample     0.83
lots      0.82
Activation Density: 0.035%
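Activation density is commonly read as the fraction of token positions on which the feature fires. The page does not say which threshold or token sample it uses. The sketch below therefore assumes "fires" means an activation strictly above 0, and the function name `activation_density` is hypothetical. It also shows the arithmetic behind the figure: 0.035% works out to roughly 7 firing tokens in every 20,000.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes a feature counts as active when its
    activation is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 7 of 20,000 tokens.
acts = [0.0] * 20_000
for i in range(7):
    acts[i * 1000] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")  # 0.035%
```

Under this reading, a density this low means the feature is very sparse: it stays silent on almost every token and fires only in the narrow contexts described in the explanations.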