INDEX
Explanations
Attends from specific instances in a sentence to the related general concepts or categories.
Head Attr Weights
0:0.06
1:0.06
2:0.06
3:0.10
4:0.06
5:0.02
6:0.38
7:0.23
Negative Logits
pregunto          -0.28
думаете           -0.28
epresidente       -0.27
+:+               -0.26
ordum             -0.26
deelte            -0.25
pholes            -0.25
PointerException  -0.25
IonicModule       -0.25
pernicus          -0.24
Positive Logits
Anſ               0.32
ſelf              0.32
Monfieur          0.32
mste              0.31
either            0.31
ſelves            0.31
astify            0.31
Cæsar             0.31
idak              0.30
either            0.30
Activations Density: 0.854%