Explanations
words related to actions or events that have a strong impact or influence
verbs that indicate dependencies or impacts in various contexts
Negative Logits
volent    -0.78
nex       -0.64
ozo       -0.61
SW        -0.61
street    -0.61
shore     -0.60
itri      -0.60
ilar      -0.59
zo        -0.59
rone      -0.59
Positive Logits
hift      0.98
ettings   0.91
paces     0.90
ometimes  0.77
hement    0.74
hirt      0.71
autions   0.71
ilver     0.70
heet      0.70
olation   0.69
Activations Density: 0.427%