Explanations
numerical values related to amounts of money or measurements
repeated instances of the number "25" in various contexts
Negative Logits
Token           Logit
;;;;;;;;;;;;    -0.73
wom             -0.69
tact            -0.64
bang            -0.63
hon             -0.62
clos            -0.61
tack            -0.58
democracy       -0.58
omn             -0.58
thumbs          -0.57
Positive Logits
Token           Logit
25              3.16
26              2.24
75              2.22
24              2.15
23              2.12
28              2.11
20              2.09
27              2.09
29              2.08
30              2.07
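Dashboards for sparse-autoencoder features commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The sketch below assumes that procedure (and, as a simplification, ignores any final layer norm). The function names `logit_weights` and `top_and_bottom` are hypothetical, and the matrices are toy, made-up numbers rather than this feature's real weights.

```python
from typing import List, Sequence, Tuple


def logit_weights(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix (d_model rows x vocab_size columns).

    Returns one weight per vocabulary token: how much writing this feature
    into the residual stream directly raises (+) or lowers (-) that logit.
    Assumption: no final layer norm is applied before the unembedding.
    """
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(weights: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, weight) pairs."""
    ranked = sorted(zip(vocab, weights), key=lambda pair: pair[1])
    bottom = ranked[:k]                  # most negative first
    top = list(reversed(ranked[-k:]))    # most positive first
    return top, bottom


# Toy, made-up numbers: d_model = 2, vocabulary of 4 tokens.
vocab = ["25", "26", "wom", "tact"]
decoder_dir = [1.0, 0.5]
unembed = [
    [2.0, 1.5, -0.5, -0.3],
    [1.0, 0.8, -0.4, -0.6],
]
top, bottom = top_and_bottom(logit_weights(decoder_dir, unembed), vocab, k=2)
print("positive:", top)
print("negative:", bottom)
```

Because the weights depend only on the decoder direction and the unembedding, they describe the feature's direct effect on next-token predictions, independent of any particular input.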
Activation Density: 0.015%
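An activation density of 0.015% means the feature is active on roughly 15 of every 100,000 tokens (0.015% = 0.00015). The sketch below assumes density is defined as the fraction of sampled tokens on which the feature's activation is strictly positive. The function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: "active" means strictly greater than 0 by default.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Made-up sample: a feature that fires on 3 of 20,000 tokens.
acts = [0.0] * 19_997 + [1.2, 0.4, 3.0]
print(f"{activation_density(acts):.3%}")  # prints 0.015%
```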