Explanations
proper nouns, particularly names of people
words that start with Dil, Til, Vil, Pir, Ein, or Rid
Negative Logits
fubject          -0.66
Зноскі           -0.65
setViewportView  -0.63
Vibe             -0.60
Sparkle          -0.59
Autoritní        -0.59
Vue              -0.58
bättre           -0.58
aze              -0.58
Algo             -0.57
Positive Logits
Til  0.83
Til  0.72
Vil  0.72
Pir  0.68
Pir  0.65
Vil  0.59
Tum  0.57
Dil  0.54
Rid  0.54
Pil  0.54
Activations Density 0.011%