INDEX
Explanations
words or names related to specific characters or entities
Negative Logits
aft       -0.17
chin      -0.16
cen       -0.16
i         -0.15
engin     -0.15
opoulos   -0.15
camp      -0.15
onte      -0.15
d         -0.15
ence      -0.15
Positive Logits
erals     0.22
iversit   0.21
ghi       0.21
erable    0.20
iversal   0.20
ächst     0.20
cheon     0.20
iverse    0.19
iversity  0.19
lap       0.19
Activations Density 0.080%