INDEX
Explanations
references to written texts, documents, or agreements
references to specific texts and their descriptions
Negative Logits
Token              Logit
pload              -0.80
Sniper             -0.78
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   -0.69
alty               -0.69
Ern                -0.69
ño                 -0.67
Accessory          -0.66
rolet              -0.64
Parents            -0.62
Äĩ                 -0.62
Positive Logits
Token              Logit
texts              1.02
ured               1.01
uality             0.97
text               0.93
urally             0.92
book               0.91
books              0.90
ural               0.89
messaging          0.87
messages           0.85
Activations Density 0.013%