INDEX
Explanations
phrases related to actions or statements made by specific individuals
statements expressing opinions or claims about individuals
Negative Logits
Token    Logit
.(       -0.81
.</      -0.80
.*       -0.79
}.       -0.79
.<       -0.78
.}       -0.75
ãĢĤ      -0.71
.-       -0.70
>.       -0.69
:-       -0.65
Positive Logits
Token    Logit
,"       0.98
xiety    0.93
%"       0.86
ffield   0.85
"),      0.84
,'"      0.83
[        0.82
ain      0.82
zbollah  0.82
initely  0.82
Activations Density 0.334%
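The density figure can be read as the share of sampled tokens on which the feature fires. The sketch below illustrates that reading only; the exact definition used by the dashboard is not given in the source, so the "activation greater than zero" threshold, the function name `activation_density`, and the synthetic sample are all assumptions.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is > 0.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic sample (not real data): 334 active tokens out of 100,000.
sample = [0.0] * 99_666 + [1.0] * 334
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.334%
```

A density this low means the feature is sparse: on this reading it is active on roughly one token in three hundred.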