Explanations
abstract concepts and specific nouns
Negative Logits
+\    0.52
قة    0.46
зия    0.46
げ    0.45
жа    0.44
UB    0.44
HR    0.43
цион    0.43
ИА    0.43
按摩    0.43
Positive Logits
Encryption    0.53
custom    0.49
deciduous    0.49
solenoid    0.48
Custom    0.48
redist    0.48
Auction    0.48
Redist    0.47
surveys    0.47
stably    0.47
Activations Density 0.001%