INDEX
Explanations
negative expressions regarding personal beliefs or opinions
Negative Logits
Token      Logit
ſelf       -0.93
iſt        -0.88
Houſe      -0.87
Efq        -0.87
་་         -0.85
ſtate      -0.82
cauſe      -0.81
houſe      -0.80
phosa      -0.80
uſe        -0.79
Positive Logits
Token      Logit
<eos>      0.57
+#+#       0.57
it         0.54
d          0.52
g          0.50
↵↵         0.48
WebVitals  0.47
m          0.46
.          0.46
G          0.46
Activations Density: 0.403%