Explanations
references to specific names of individuals, particularly the name "Lu," which it activates for
Negative Logits
Token           Logit
lu              -1.16
luk             -0.83
luc             -0.81
ThemeOverlay    -0.75
lug             -0.68
luor            -0.67
lup             -0.65
lü              -0.61
lul             -0.60
Cita            -0.60
Positive Logits
Token             Logit
Lu                2.17
Lu                1.77
expandindo        0.70
posedge           0.63
ագրություններ     0.58
aarrggbb          0.57
Ecotoxicity       0.56
Зноскі            0.55
feira             0.54
שוליים            0.54
Activations Density 0.001%