INDEX
Explanations
mentions of legal or formal terms and procedures
discussion of significant experiences and their impacts
Negative Logits
ゴン            -0.64
+.              -0.59
essage          -0.59
etheless        -0.58
avorite         -0.58
anwhile         -0.58
ometimes        -0.57
respectively    -0.56
eatures         -0.55
destro          -0.55
Positive Logits
,'"             1.12
"—              1.12
,"              1.10
%"              1.08
")              1.03
"]              1.02
"),             0.96
"?              0.95
"               0.95
":              0.95
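Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below illustrates that idea under stated assumptions: `decoder_dir`, `unembed`, and the toy vocabulary and numbers are hypothetical, and any normalization a real dashboard might apply before the projection is ignored.

```python
def logit_effects(decoder_dir, unembed):
    """Project a feature's decoder direction onto every vocabulary token.

    decoder_dir: list of d_model floats (hypothetical feature direction)
    unembed: d_model rows, each a list of vocab_size floats (hypothetical W_U)
    Returns one score per vocabulary token (decoder_dir @ unembed).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_and_bottom(scores, vocab, k):
    """Return the k most suppressed and k most promoted (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = [',"', '")', 'essage', 'anwhile']
scores = logit_effects([1.0, -0.5], [[0.9, 0.8, -0.3, -0.2],
                                     [-0.4, -0.2, 0.6, 0.5]])
negative, positive = top_and_bottom(scores, vocab, 2)
print("Negative:", negative)
print("Positive:", positive)
```

Sorting the full score list is fine for a toy vocabulary; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest` / `heapq.nlargest` would avoid a full sort.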
Activation density: 0.625%
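A density figure like this is usually read as the percentage of sampled token positions on which the feature fires. The minimal sketch below assumes "fires" means an activation strictly above a threshold of 0.0; the function name `activation_density` and the `threshold` parameter are hypothetical illustrations, not a documented API. One active position out of 160 tokens gives the 0.625% shown above.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature fires.

    activations: per-token activation values for one feature over a sample
    threshold: hypothetical cutoff; a position counts as active if its value
               is strictly greater than this (0.0 by default)
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: one firing position out of 160 tokens.
sample = [0.0] * 159 + [3.2]
print(f"{activation_density(sample):.3f}%")  # prints "0.625%"
```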