INDEX
Explanations
phrases related to comparisons or contrasts
Negative Logits
dylib      -0.78
Coffin     -0.68
Femin      -0.66
Alban      -0.65
berra      -0.64
chieve     -0.63
achev      -0.63
Starr      -0.62
Narc       -0.60
Dempsey    -0.60
Positive Logits
shouldn              0.84
initely              0.83
naturally            0.81
definitely           0.79
hopefully            0.78
kinda                0.77
chances              0.77
externalToEVAOnly    0.73
accordingly          0.72
yeah                 0.71
Activations Density: 0.588%