INDEX
Explanations
references to characters in a story or media
references to characters in narratives
Negative Logits
agles      -0.74
¿½         -0.69
atories    -0.67
yg         -0.66
rup        -0.65
angular    -0.65
sterdam    -0.64
aband      -0.62
obar       -0.62
roxy       -0.61
Positive Logits
istically  1.52
izations   1.38
istics     1.37
acters     1.31
isation    1.19
isations   1.18
izes       1.06
arcs       0.97
arc        0.93
ised       0.93
Activations Density: 0.052%