INDEX
Explanations
verbs that suggest observation or learning
Negative Logits
ysl      -0.17
irm      -0.15
iar      -0.15
ackers   -0.15
ida      -0.15
_basis   -0.14
oir      -0.14
AMY      -0.14
us       -0.14
tank     -0.14
Positive Logits
ton      0.17
owell    0.15
utow     0.15
ittel    0.15
ton      0.15
enÄĽ     0.14
igs      0.14
ufs      0.14
itzer    0.14
nable    0.14
Activations Density 0.000%