INDEX
Explanations
articles and determiners preceding nouns
Negative Logits
Token        Logit
ime          -0.18
situation    -0.17
ame          -0.15
arse         -0.15
adio         -0.14
heet         -0.14
uncate       -0.14
SEMB         -0.14
вов          -0.14
apon         -0.14
Positive Logits
Token        Logit
regard       0.25
regards      0.24
emphasis     0.22
stood        0.21
vengeance    0.19
ered         0.19
focus        0.17
emphasis     0.16
ansi         0.16
iston        0.15
Activations Density 0.066%