Explanations
phrases describing manners or methods of actions
Negative Logits
unar      -0.15
unate     -0.15
лов       -0.15
onga      -0.14
sic       -0.14
undy      -0.14
pheric    -0.14
管        -0.14
airs      -0.13
-ves      -0.13
Positive Logits
manner    0.23
fashion   0.21
thức      0.20
isms      0.19
ward      0.18
/place    0.16
Claw      0.15
way       0.15
ways      0.14
Tamb      0.14
Activations Density: 0.039%