INDEX
Explanations
words related to specific names or titles, potentially in a playful or informal context
Negative Logits
Token  Logit
INU    -0.21
IFI    -0.19
SCRI   -0.19
AGMA   -0.17
IOD    -0.17
ICLE   -0.17
PLIC   -0.17
ILI    -0.17
IMIT   -0.16
ICC    -0.16
Positive Logits
Token  Logit
hi     0.40
bi     0.39
vi     0.34
di     0.33
pi     0.33
li     0.32
ui     0.32
ni     0.30
ki     0.29
ii     0.29
Activations Density 0.032%