INDEX
Explanations
terms related to consequences, values, and abstract concepts
concepts related to political and social issues
Negative Logits
arnaev        -0.56
anwhile       -0.55
owicz         -0.54
Mub           -0.50
ersen         -0.50
Downloadha    -0.50
ewski         -0.49
etsk          -0.48
zl            -0.47
Shed          -0.47
Positive Logits
votes         0.54
ravity        0.50
thood         0.48
lessness      0.46
relating      0.45
proportions   0.44
stature       0.42
advertising   0.42
Reviewer      0.42
fame          0.42
Activations Density: 1.249%