INDEX
Explanations
phrases indicating completion or continuation of an action or state
the word "been" and its various forms
Negative Logits
rones      -0.75
arta       -0.70
Bars       -0.69
achu       -0.66
ives       -0.65
anth       -0.65
fray       -0.64
Lowe       -0.62
Wid        -0.61
regate     -0.61

Positive Logits
able        1.14
seen        0.99
taken       0.98
deemed      0.97
treated     0.96
replaced    0.95
subjected   0.95
wolves      0.93
bitten      0.92
shown       0.90

Activations Density 0.156%