Explanations
-ly followed by positive adjectives
Negative Logits
ल्पन            0.43
usely           0.43
virtualization  0.42
Kiran           0.41
듯              0.40
Bibliography    0.39
듯              0.39
हाला            0.39
fontsize        0.38
kowo            0.38
Positive Logits
hablando        0.70
falando         0.61
driven          0.61
oriented        0.60
significant     0.58
relevant        0.57
speaking        0.56
savvy           0.54
speaking        0.54
astute          0.54
Activations Density 0.030%