INDEX
Explanations
verbs that indicate statements or claims made by individuals
Negative Logits
Token       Logit
ubb         -0.15
infinity    -0.14
eg          -0.14
Hag         -0.14
ighth       -0.14
Vel         -0.14
fw          -0.14
avier       -0.14
infinity    -0.14
ød          -0.14
Positive Logits
Token           Logit
ampo            0.17
ContentLoaded   0.15
lest            0.15
leck            0.15
frey            0.15
-NLS            0.15
mastur          0.14
hazi            0.14
uyla            0.14
ŃĶ              0.14
Activations Density: 0.064%