INDEX
Explanations
Mentions of a specific individual or entity, particularly the name "Ku".
Negative Logits
Token        Logit
misleading   -0.43
Holl         -0.40
팎           -0.39
phur         -0.39
Andersson    -0.38
ніципалі     -0.38
Holl         -0.38
assur        -0.37
ith          -0.37
thed         -0.37
Positive Logits
Token        Logit
ku           2.48
KU           1.89
Ku           1.86
ku           1.80
Ku           1.73
KU           1.48
ку           1.38
쿠           1.20
aku          1.20
ку           1.16
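The page does not state how these two lists are produced. A common approach for sparse-autoencoder features, assumed here rather than confirmed by the page, is to multiply the feature's decoder direction by the model's unembedding matrix and keep the k largest and k smallest entries. The sketch below illustrates that assumption with NumPy. The names `top_logit_effects`, `decoder_dir`, `W_U`, and `vocab` are hypothetical, and the toy matrix is made up for illustration. It is not data from this feature.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by how strongly one feature direction moves their logits.

    Assumption (hypothetical setup): decoder_dir is the feature's decoder vector,
    shape [d_model]; W_U is the unembedding matrix, shape [d_model, d_vocab].
    """
    logits = decoder_dir @ W_U          # direct effect on each token's logit, shape [d_vocab]
    order = np.argsort(logits)          # token indices sorted from most negative to most positive
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]        # most negative first
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]  # most positive first
    return positive, negative

# Toy example with made-up numbers (not this feature's real weights).
vocab = ["tok0", "tok1", "tok2", "tok3"]
W_U = np.array([[2.0, 1.5, -0.4, 0.1],
                [0.0, 0.0, 0.0, 0.0]])
decoder_dir = np.array([1.0, 0.0])
pos, neg = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(pos)  # [('tok0', 2.0), ('tok1', 1.5)]
print(neg)  # [('tok2', -0.4), ('tok3', 0.1)]
```

With k=10 this would yield two ten-entry lists ordered the same way as the ones above: most positive first and most negative first. Repeated entries such as "ku" and "Ku" would then be distinct vocabulary items whose differences (for example, a leading space) were lost when the page was copied.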
Activations Density 0.009%
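A density of 0.009% means the feature is active on about 9 of every 100,000 tokens, or roughly 1 token in 11,000. The sketch below shows that calculation. It assumes a token counts as "active" when the feature's activation is strictly above a threshold of 0. The page does not state its exact criterion, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which a feature's activation exceeds `threshold`.

    Assumption: "active" means activation > threshold (0 by default).
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: 9 active tokens out of 100,000 gives 0.009%.
acts = np.zeros(100_000)
acts[:9] = 0.5
print(activation_density(acts))  # 0.009
```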