INDEX
Explanations
references to individual experiences and perspectives
Negative Logits
orda      -0.17
antium    -0.16
atti      -0.15
lech      -0.14
Interop   -0.14
Mata      -0.14
inant     -0.14
flix      -0.14
landa     -0.14
oll       -0.13
Positive Logits
so        0.18
amak      0.17
GTK       0.16
uddle     0.15
gan       0.15
tolik     0.15
tanto     0.15
anto      0.14
tão       0.14
428       0.14
Activations Density 0.140%