INDEX
Explanations
phrases related to upward or outward movement
Negative Logits
verifyException    -0.90
ModelExpression    -0.79
IsContent          -0.76
estekak            -0.76
Искәрмәләр         -0.74
مشين               -0.73
mobileqq           -0.73
InjectAttribute    -0.72
PreferredItem      -0.72
חיצוניים           -0.71
Positive Logits
up      0.65
away    0.48
out     0.46
off     0.46
2       0.44
↵       0.43
les     0.38
3       0.37
&       0.37
1       0.36
Activations Density: 0.226%