INDEX
Explanations
phrases indicating personal opinions or feelings about situations
First-person pronouns followed by a verb
auxiliary verbs and pronouns
Negative Logits
RegressionTest    -0.65
躇                -0.65
vábbi             -0.62
andaag            -0.59
stranded          -0.58
כז                -0.56
MERCE             -0.55
stdc              -0.55
invokingState     -0.55
揄                -0.55
Positive Logits
does     1.65
did      1.65
DID      1.48
do       1.43
DOES     1.36
did      1.36
does     1.32
sí       1.31
确实     1.28
כן       1.25
Activations Density 0.340%
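
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed, assuming it means the fraction of tokens on which the feature's activation is above zero. The helper name `activation_density`, the threshold, and the sample activations are all hypothetical and are not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Hypothetical helper: assumes "density" means the share of tokens
    on which the feature fires at all.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up activations for 1,000 tokens: 997 silent, 3 active.
sample = [0.0] * 997 + [1.2, 0.4, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.300%"
```

Under that assumption, a density of 0.340% would correspond to the feature firing on roughly 3 or 4 tokens out of every 1,000.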