Explanations
references to individuals and their roles in a specified context
Negative Logits
alat  -0.17
premise  -0.16
isd  -0.15
awner  -0.15
awning  -0.15
urve  -0.15
Prem  -0.15
vection  -0.14
opes  -0.14
aws  -0.14
Positive Logits
oter  0.16
fusc  0.15
sea  0.15
PÅĻÃŃ  0.14
Hubbard  0.14
ILT  0.14
Fritz  0.14
bara  0.14
disen  0.13
Ø«ÛĮر  0.13
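Dashboards of this kind usually compute the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then keeping the k most negative and k most positive tokens. The sketch below illustrates that idea on a toy vocabulary; the function name `logit_effects` and the inputs `decoder_dir`, `unembed`, and `vocab` are hypothetical, and this is an assumption about the method, not this dashboard's actual code.

```python
def logit_effects(decoder_dir, unembed, vocab, k=3):
    """Return the k most negative and k most positive (token, score) pairs.

    Hypothetical sketch: decoder_dir is the feature's decoder vector
    (length d_model); unembed is a d_model x vocab_size matrix given as a
    list of rows; vocab lists the token strings, one per column.
    """
    # Direct logit effect of each token: dot product between the decoder
    # direction and that token's column of the unembedding matrix.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(len(vocab))
    ]
    # Sort ascending by score so the most negative tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


vocab = ["a", "b", "c", "d"]
decoder_dir = [1.0, -1.0]
unembed = [[0.5, 0.1, -0.2, 0.0],
           [0.0, 0.3, 0.4, -0.1]]
neg, pos = logit_effects(decoder_dir, unembed, vocab, k=2)
print([t for t, _ in neg], [t for t, _ in pos])  # ['c', 'b'] ['a', 'd']
```

The logit values shown on a real dashboard may also be normalized or computed against a folded-in final layer norm, so magnitudes from this toy version are not directly comparable to the numbers listed here.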
Activation Density 0.675%
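Activation density is commonly read as the fraction of sampled token positions on which the feature fires. A density of 0.675% therefore corresponds to roughly 675 active positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above zero; the threshold, the function name `activation_density`, and the sample data are illustrative assumptions, not this dashboard's documented procedure.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold.

    Hypothetical sketch: activations is a flat list of per-token feature
    activations; threshold=0.0 is an assumed firing criterion.
    """
    if not activations:
        return 0.0
    # Count the positions where the feature is active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 675 active positions out of 100,000 tokens.
sample = [0.0] * 99_325 + [0.5] * 675
print(f"{activation_density(sample):.3%}")  # 0.675%
```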