INDEX
Explanations
adverbs or adjectives indicating success or proficiency
significant and impactful words indicating success or authority
Negative Logits
ieri      -0.84
ivas      -0.73
armac     -0.70
fman      -0.69
apult     -0.66
irez      -0.64
veyard    -0.64
Kul       -0.63
strom     -0.63
к         -0.62
Positive Logits
moderator    0.62
ire          0.58
refers       0.57
resumes      0.57
ado          0.56
quotes       0.56
pedia        0.56
quake        0.55
infer        0.55
blindness    0.55
Activations Density: 0.832%