INDEX
Explanations
listing concepts or qualities
Negative Logits
което        0.88
(            0.86
selaku       0.84
дает         0.82
дают         0.80
你就         0.80
گرفتن        0.79
System       0.79
ệc           0.78
ifizieren    0.76
Positive Logits
symbolism        1.13
rhetoric         1.01
amenities        1.01
delusions        0.98
hardships        0.96
quirks           0.96
misery           0.94
visions          0.94
misinformation   0.93
craftsmanship    0.93
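The dashboard does not say how these logit values were computed. A common convention for sparse-autoencoder feature pages, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector and list the tokens with the largest and smallest scores. The sketch below uses toy, hypothetical vectors and the hypothetical helper names `logit_effects` and `top_logits`; it illustrates the assumed convention and is not this tool's actual implementation.

```python
def logit_effects(decoder_dir, unembed):
    """Score every token by the dot product of the (assumed) feature decoder
    direction with that token's unembedding vector.

    decoder_dir: list of floats, length d_model (hypothetical input)
    unembed: dict mapping token string -> list of floats of length d_model
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_logits(effects, k):
    """Return the k highest-scoring (positive) and k lowest-scoring
    (negative) tokens as lists of (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]           # smallest scores first
    positive = ranked[::-1][:k]     # largest scores first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example; real models use thousands of dimensions.
    effects = logit_effects(
        [1.0, -0.5],
        {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]},
    )
    pos, neg = top_logits(effects, 1)
    print(pos)  # [('a', 1.0)]
    print(neg)  # [('b', -0.5)]
```

Under this assumed convention, the "Positive Logits" list holds the tokens whose predicted probability the feature raises most, and the "Negative Logits" list holds those it suppresses most.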
Activation Density: 0.101%
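Activation density is usually read as the percentage of sampled tokens on which the feature is active. Assuming "active" means an activation strictly above zero (the dashboard does not state its threshold), a density of 0.101% means the feature fires on roughly one token in a thousand. A minimal sketch, with the hypothetical helper name `activation_density`:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    activations: list of per-token feature activations (hypothetical input)
    threshold: assumed firing threshold; 0.0 counts any positive activation
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # One firing token out of 1,000 gives a density of 0.1%.
    print(activation_density([0.0] * 999 + [2.3]))  # 0.1
```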