Explanations
the concept of happiness and references to gas in various contexts
Negative Logits
Token      Logit
(blank)    -0.43
-          -0.43
–          -0.42
“          -0.40
↵          -0.40
co         -0.38
al         -0.38
(blank)    -0.37
/          -0.37
Pinto      -0.36
Positive Logits
Token          Logit
ftagPool       0.94
queſta         0.92
ंदीखरीदारी      0.92
<unused79>     0.89
<unused23>     0.88
<unused14>     0.88
<unused28>     0.88
majánló        0.88
[@BOS@]        0.88
<unused8>      0.88
Activations Density 0.299%
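
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed, assuming it is the fraction of token positions where the feature's activation is above zero. The page does not state this definition, so treat it as an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not reproduce the 0.299% reported above.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all; the dashboard itself does not define the term.
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    # Count positions where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 3 of them with a nonzero activation.
sample = [0.0] * 997 + [0.7, 1.2, 0.05]
density = activation_density(sample)
print(f"Activation density: {density:.3%}")
```

A density well under 1% would indicate a sparse feature that is silent on the vast majority of tokens.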