INDEX
Explanations
words related to tendencies or inclinations
repeated phrases indicating tendencies or patterns in behavior
Negative Logits
Token    Logit
gur      -0.73
arta     -0.72
lain     -0.67
yz       -0.65
Bunker   -0.60
fil      -0.60
Agenda   -0.59
ft       -0.58
ZA       -0.56
zbek     -0.56
Positive Logits
Token    Logit
rils     1.29
entious  1.12
ril      1.00
erers    0.89
entimes  0.89
erer     0.87
erest    0.87
uce      0.82
ensical  0.81
eman     0.77
Activations Density: 0.014%