INDEX
Explanations
phrases expressing causality or change
statements indicating change or realization
Negative Logits
Token     Logit
ahime     -0.94
edia      -0.81
prus      -0.68
gart      -0.65
ounter    -0.64
[|        -0.64
rique     -0.64
uca       -0.62
racuse    -0.62
rican     -0.61
Positive Logits
Token     Logit
except     1.03
together   0.99
together   0.89
toget      0.83
facets     0.71
alike      0.71
besides    0.70
hoop       0.68
Ùĩ         0.67
nodd       0.66
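Dashboards of this kind commonly build the two lists above by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. This page does not state its exact procedure. The sketch below shows that projection under stated assumptions: `w_dec` (decoder direction), `W_U` (unembedding, d_model x vocab) and `logit_effects` are hypothetical names, and the toy numbers are illustrative only.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    w_dec: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative, most positive) tokens by direct logit effect.

    Assumptions (hypothetical): w_dec is the feature's decoder direction
    of length d_model; W_U has d_model rows and len(vocab) columns.
    """
    d_model = len(w_dec)
    # Effect on each token's logit = dot product of w_dec with that
    # token's column of the unembedding matrix.
    effects = [
        sum(w_dec[i] * W_U[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending so the most negative effects come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
w_dec = [1.0, -0.5]
W_U = [
    [0.2, -0.9, 0.6, 0.0],
    [0.4, 0.1, -0.8, 0.3],
]
vocab = ["a", "b", "c", "d"]
neg, pos = logit_effects(w_dec, W_U, vocab, k=2)
print("negative:", [tok for tok, _ in neg])  # ['b', 'd']
print("positive:", [tok for tok, _ in pos])  # ['c', 'a']
```

Real dashboards would do this with a single matrix-vector product over the full vocabulary; the pure-Python loop is only meant to make the arithmetic explicit.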
Activation Density: 0.279%
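Activation density is usually read as the percentage of sampled tokens on which the feature's activation is above zero; at 0.279%, the feature would fire on roughly 2.8 tokens per 1,000. The page does not state its threshold or sample, so the sketch below assumes a flat list of per-token activations and a threshold of 0; `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is
    strictly greater than the threshold (0 by default).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations given")
    return 100.0 * active / total


# Toy example: 3 active tokens out of 1000.
acts = [0.0] * 997 + [0.5, 1.2, 3.0]
print(f"{activation_density(acts):.3f}%")  # 0.300%
```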