INDEX
Explanations
explaining or defining terms
NEGATIVE LOGITS
North       0.45
ż           0.43
s           0.43
lumineux    0.42
cache       0.42
j           0.42
sechs       0.42
vibrant     0.42
கள்         0.42
7           0.41
POSITIVE LOGITS
morally       0.48
selfishness   0.47
argu          0.44
নিজের         0.44
legitimacy    0.43
উচিৎ          0.43
abusers       0.43
اپنی          0.42
immoral       0.42
consequences  0.42
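Positive and negative logit lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix. The per-token scores are then sorted, and the k highest and k lowest are kept. The sketch below assumes that approach. `logit_effects`, `top_logits`, and the toy inputs are hypothetical names for illustration, not the dashboard's own code. The sketch keeps the sign of each score.

```python
def logit_effects(decoder_dir, unembed):
    """Project a feature's decoder direction onto every vocabulary token.

    Hypothetical helper. decoder_dir has one entry per model dimension;
    unembed has one row per model dimension, each row holding one entry
    per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logits(effects, tokens, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    # Token indices ordered from lowest to highest score.
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    negative = [(tokens[t], effects[t]) for t in order[:k]]
    # Highest scores first for the positive list.
    positive = [(tokens[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
    effects = logit_effects([1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]])
    pos, neg = top_logits(effects, ["a", "b", "c"], 1)
    print(pos)  # [('a', 0.5)]
    print(neg)  # [('b', -0.2)]
```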
Activations Density 0.015%
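Activation density is usually read as the share of token positions on which the feature fires, meaning its activation is above zero. Under that reading, 0.015% corresponds to roughly 3 active tokens in every 20,000. Below is a minimal sketch under that assumption. `activation_density` and its `threshold` parameter are hypothetical names for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold.

    Hypothetical helper; assumes "density" means the fraction of tokens on
    which the feature is active, expressed as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 3 nonzero activations out of 20,000 token positions (made-up data).
    acts = [0.0] * 19997 + [1.2, 0.5, 3.0]
    print(f"{activation_density(acts):.3f}%")  # 0.015%
```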