INDEX
Explanations
words related to the beginning or initiation of actions or events
Negative Logits
aths         -0.68
obi          -0.68
athed        -0.66
acho         -0.66
ugs          -0.66
rero         -0.63
owl          -0.63
itsch        -0.62
entirety     -0.62
warts        -0.59
Positive Logits
anew          1.37
nings         0.94
raining       0.81
behaving      0.80
hostilities   0.74
rek           0.73
airing        0.72
bothering     0.70
unravel       0.70
worrying      0.70
Activation Density: 2.865%
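The density figure is a percentage over evaluated tokens. The sketch below shows one plausible way to compute it, assuming density means the share of tokens on which the feature's activation is strictly above a threshold of zero. The function name `activation_density`, the threshold parameter, and the sample counts are hypothetical and chosen only for illustration; they are not taken from this dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Hypothetical helper: assumes "density" = firing tokens / all tokens * 100.
    """
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    # Count tokens on which the feature fires (activation above threshold).
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical example: 29 of 1,000 tokens have a positive activation.
acts = [0.5] * 29 + [0.0] * 971
print(f"{activation_density(acts):.3f}%")  # 2.900%
```

Under this assumption, a density near 3% marks a fairly sparse feature: it is silent on the large majority of tokens.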