Explanations
verbs indicating action or possibility
concepts related to significant issues or dilemmas in society
Negative Logits
Seym          -0.62
Vaugh         -0.59
secretaries   -0.56
Kardash       -0.54
Aud           -0.52
anwhile       -0.52
taxp          -0.47
fitt          -0.47
enegger       -0.47
sidx          -0.47
Positive Logits
\">           0.57
olina         0.56
olin          0.54
acular        0.52
Joined        0.51
rogens        0.50
mite          0.50
aves          0.48
SourceFile    0.48
>>>           0.48
Activations Density: 0.654%