Explanations
decimal numbers
numerical values followed by a decimal point
Negative Logits
newer          -0.63
extermination  -0.63
tom            -0.62
revision       -0.59
sometime       -0.58
unders         -0.57
illegal        -0.57
privileged     -0.57
travelers      -0.57
young          -0.56
Positive Logits
005    1.44
0001   1.39
025    1.37
05     1.34
001    1.34
002    1.33
00000  1.30
075    1.26
004    1.26
003    1.26
Activations Density 0.042%