Explanations
numbers indicating a quantity or amount, specifically the number 25 with varying degrees of activation
repeated references to the number 25
Negative Logits
oldemort   -0.66
traged     -0.61
Gaal       -0.58
oké        -0.58
aghd       -0.57
.""        -0.57
erent      -0.55
nodd       -0.55
cle        -0.55
EEE        -0.55
Positive Logits
25   2.91
26   2.24
35   2.12
27   2.12
30   2.10
23   2.05
20   2.03
24   2.03
29   2.00
28   2.00
Activations Density: 0.039%